Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

Authors

Joaquín Abellán

Francisco Javier García Castellano

Abstract

Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, which is based on general entropy, can be used as a quick variable selection method. This measure ranks the importance of the attribute variables on a variable under study via the information obtained from a dataset. The main drawback is that it is always non-negative and it requires setting the information threshold to select the set of most important variables for each dataset. We introduce here a new quick variable selection method that generalizes the method based on the Info-Gain measure. It uses imprecise probabilities and the maximum entropy measure to select the most informative variables without setting a threshold. This new variable selection method, combined with the Naive Bayes classifier, improves the original method and provides a valuable tool for handling datasets with a very large number of features and a huge amount of data, where more complex methods are not computationally feasible.
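The ranking idea in the abstract can be illustrated with a minimal sketch. It computes the classical Info-Gain of each attribute, then repeats the computation with a maximum-entropy functional in place of Shannon entropy. The abstract does not say how the imprecise probabilities are built, so the sketch assumes an imprecise-Dirichlet-style credal set with a hypothetical parameter `s`, and it weights each attribute value by its relative frequency. The function names and the toy data are illustrative only and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (bits) of the relative frequencies of `counts`."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def max_entropy_idm(counts, s=1.0):
    """Maximum entropy over an IDM-style credal set (assumed model).

    The extra mass `s` is poured onto the smallest counts first
    (water-filling), giving the most uniform admissible distribution.
    """
    levels = sorted(float(c) for c in counts)
    remaining = s
    k = 1  # the k lowest entries are currently tied
    while k < len(levels) and remaining > 0:
        need = (levels[k] - levels[k - 1]) * k  # mass to reach next level
        if need > remaining:
            break
        remaining -= need
        for i in range(k):
            levels[i] = levels[k]
        k += 1
    for i in range(k):
        levels[i] += remaining / k
    return entropy(levels)


def info_gain(xs, ys, h=entropy):
    """H(C) - sum_x P(x) * H(C | X = x), using entropy functional `h`."""
    classes = sorted(set(ys))
    n = len(ys)
    class_counts = Counter(ys)
    gain = h([class_counts[c] for c in classes])
    for x, n_x in Counter(xs).items():
        sub = Counter(y for xi, y in zip(xs, ys) if xi == x)
        # Unseen classes keep a zero count so they can receive mass from s.
        gain -= n_x / n * h([sub[c] for c in classes])
    return gain


# Toy data (hypothetical): `a` determines the class, `b` is unrelated to it.
ys = [0, 0, 0, 0, 0, 0, 1, 1]
features = {
    "a": [0, 0, 0, 0, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}

classic = {name: info_gain(xs, ys) for name, xs in features.items()}
imprecise = {name: info_gain(xs, ys, h=max_entropy_idm)
             for name, xs in features.items()}

# With the maximum-entropy variant, keeping the positive scores needs no
# dataset-specific threshold.
selected = [name for name, score in imprecise.items() if score > 0]
print(classic, imprecise, selected)
```

On this toy data the classical score of the unrelated attribute is zero, while its maximum-entropy score is negative, so a simple sign test is enough to discard it. The surviving attributes would then be passed to a Naive Bayes classifier.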


Similar resources

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...


Accuracy comparison between gene selection methods using NAIVE Bayes classifier for the microarray data of JEV infected Mus Musculus brain cells

Japanese Encephalitis is the most important cause of epidemic encephalitis worldwide. From the reports by various sources, about 68,000 cases of Japanese encephalitis (JE) are estimated to occur each year [1]. A vaccine is available for Japanese encephalitis, which utilizes effectively killed inoculated bacteria, but it is expensive and requires a primary vaccination followed by two successive ...


Incremental Weighted Naive Bayes Classifiers for Data Stream

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with naive independence assumption. The explanatory variables (Xi) are assumed to be independent from the target variable (Y). Despite this strong assumption this classifier has proved to be very effective on many real applications and is often used on data stream for supervised classification. The n...


A Survey Paper On Naive Bayes Classifier For Multi-Feature Based Text Mining

Text mining is a variant of a field called data mining. Text mining, also referred to as “Text Analytics”, is used to make unstructured data workable by the computer. Text categorization, also called topic spotting, is the task of automatically classifying a set of documents into groups from a predefined set. Text classification is an essential application and research topic because of incr...


Using Maximum Entropy For Sentence Extraction

A maximum entropy classifier can be used to extract sentences from documents. Experiments using technical documents show that such a classifier tends to treat features in a categorical manner. This results in performance that is worse than when extracting sentences using a naive Bayes classifier. Addition of an optimised prior to the maximum entropy classifier improves performance over and above th...


Journal title:

Entropy

Volume 19

Publication date: 2017
